
@raayandhar

Motivation

We noticed that importing SGLang takes a long time (#10492). I have been looking into what takes up most of that time and whether there are simple ways to reduce it. From the original issue, we want to reduce:

time python -c "from sglang.srt.managers.scheduler import Scheduler"

which is what I have been focusing my efforts on. However, I think there are things we can do to reduce time for other imports. This is more of a V1 to get community feedback from experts.

Modifications

Some imports are heavy. For example, the module-level import of the quantization methods is expensive. Moving some imports to function level (into the only function that uses them) reduces module import time. However, I can see how this could easily become an antipattern: it can even hurt performance if the import sits inside a function that is called frequently. I tried to do this only in functions that we expect to run once or a small number of times, but I understand the argument against this kind of code. I also don't think all of the changes to hf_transformers_utils.py help, so I will take a deeper look, since those changes are somewhat invasive.
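A minimal sketch of the function-level import pattern described above. The helper name load_quant_config is hypothetical, and the standard-library json module stands in for a heavy dependency such as a quantization package:

```python
import sys


def load_quant_config(raw: str) -> dict:
    """Hypothetical helper that runs once, e.g. while building a config object."""
    # Function-level import: the cost is paid on the first call,
    # not when the enclosing module is imported.
    # `json` stands in for a heavy dependency such as a quantization package.
    import json

    return json.loads(raw)


config = load_quant_config('{"quant_method": "fp8"}')
print(config["quant_method"])  # fp8

# After the first call the module is cached in sys.modules, so later calls
# only pay for a cache lookup rather than a full re-import.
print("json" in sys.modules)  # True
```

Because Python caches imported modules in sys.modules, an import inside a frequently called function does not re-execute the module; the per-call cost is a small lookup. That overhead is why the pattern is best kept to functions that run once or rarely.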

Accuracy Tests

These changes should not affect model outputs.

Benchmarking and Profiling

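Running the following command, where calc_avg.py is a small helper script that computes statistics over the collected timings:

for i in {1..100}; do (time python -B -c "from sglang.srt.managers.scheduler import Scheduler") 2>&1 | grep "^real"; done | python calc_avg.py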
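calc_avg.py itself is not reproduced in this PR description. A minimal sketch of what such a helper could look like, assuming it reads the grep'd bash `time` lines (e.g. "real" followed by a tab and 0m8.236s) from stdin and prints the same fields as the timing statistics reported here. The function names and the use of the sample standard deviation are assumptions, not the actual script:

```python
import re
import statistics
import sys

# Matches bash `time` output lines such as "real<TAB>0m8.236s".
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s")


def parse_real_seconds(lines):
    """Extract wall-clock times (in seconds) from `real` lines, ignoring others."""
    times = []
    for line in lines:
        match = REAL_RE.match(line.strip())
        if match:
            minutes, seconds = match.groups()
            times.append(int(minutes) * 60 + float(seconds))
    return times


def summarize(times):
    """Compute the statistics shown in the benchmark output."""
    return {
        "runs": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        # Sample standard deviation (an assumption; the real script may differ).
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }


def main(stream):
    times = parse_real_seconds(stream)
    if not times:
        print("no 'real' timing lines found", file=sys.stderr)
        return
    stats = summarize(times)
    print("===========Timing Statistics============")
    print(f"Number of runs: {stats['runs']}")
    print(f"Mean:     {stats['mean']:.3f}s")
    print(f"Median:   {stats['median']:.3f}s")
    print(f"Std Dev:  {stats['stdev']:.3f}s")
    print(f"Min:      {stats['min']:.3f}s")
    print(f"Max:      {stats['max']:.3f}s")
    print("=" * 40)


if __name__ == "__main__":
    main(sys.stdin)
```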

With these changes:

===========Timing Statistics============
Number of runs: 100
Mean:     8.308s
Median:   8.236s
Std Dev:  0.297s
Min:      7.999s
Max:      9.329s
========================================

Compared to top-of-main:

===========Timing Statistics============
Number of runs: 100
Mean:     9.836s
Median:   9.790s
Std Dev:  0.332s
Min:      8.655s
Max:      11.801s
========================================

So we have a ~1.5 s improvement (mean 9.836 s → 8.308 s, roughly 15%). That is not great yet, so I am going to keep working on it. I mostly targeted the time spent creating the ModelConfig object. The difference so far comes largely from removing the quantization import:

Without these changes (import_sglang_tom.log):
[Screenshot of the import profile, 2025-11-01 9:03 PM]

With these changes (import_sglang_new.log):
[Screenshot of the import profile, 2025-11-01 9:04 PM]

Machine:

  • AMD EPYC 7343 16-Core Processor
  • L40S GPU

Checklist

@raayandhar (Author)

I will continue working on this; there are more improvements to be made.

@hnyls2002 self-assigned this Nov 3, 2025
@raayandhar force-pushed the reduce-import-time branch 2 times, most recently from 9adc75a to 6ed8399 on November 6, 2025, 02:47
from sglang.utils import LazyImport

# Defer importing the MoE runner module until MoeRunner is first used.
MoeRunner = LazyImport("sglang.srt.layers.moe.moe_runner.runner", "MoeRunner")
Author

I think this (LazyImport(...)) is something we could apply to many other files as well. I'm not sure whether there is a downside?

Collaborator

Not sure. @merrymercy @fzyzcjy, what are your opinions?

@raayandhar (Author) commented Nov 6, 2025

Hi experts,

At this point, the profiling shows a solid improvement. Measuring with time python -X importtime -c "from sglang.srt.managers.scheduler import Scheduler" 2> import_sglang.log, the total went from ~6000 ms to ~4000 ms; see below:
Top-of-main, newly updated (import_sglang_tom.log):
[Screenshot of the importtime profile]
This is the improved version, with my changes (import_sglang_improved.log):
[Screenshot of the importtime profile]
The screenshots are easiest to read at full size. In short, we see an improvement of around 33% (~6000 ms → ~4000 ms). In the improved version, most of what remains is transformers / torch imports, which are basically unavoidable without some extremely invasive changes. Many of the other imports have shrunk massively, e.g. model_config from 4300 ms to 550 ms; see the logs for more details. I also have a much more aggressively optimized version (just to see what's possible) that improves this even further, but those changes are too invasive to be practical.
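For reading logs like import_sglang.log without screenshots, a small parser can rank modules by cumulative import time. This is a minimal sketch: top_imports is a hypothetical helper, and it relies only on the documented -X importtime line format ("import time: self [us] | cumulative | imported package", written to stderr, hence the 2> redirect). The sample numbers are made up, not taken from the SGLang logs:

```python
def top_imports(log_text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n modules with the largest cumulative import time (in us)."""
    rows = []
    for line in log_text.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        _self_us, cumulative_us, module = (field.strip() for field in fields)
        if not cumulative_us.isdigit():
            continue  # skip the "self [us] | cumulative | imported package" header
        rows.append((module, int(cumulative_us)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Made-up sample in the -X importtime format (not real SGLang numbers).
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   example.light
import time:       900 |        900 |     example.heavy
import time:      2500 |       3400 |   example.parent
"""

for module, cumulative_us in top_imports(SAMPLE, n=2):
    print(f"{cumulative_us / 1000:8.1f} ms  {module}")
```

Cumulative times include nested imports, so a parent and its children overlap and should not be summed; comparing the same module across the top-of-main and improved logs is the more reliable way to see what a change saved.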

The changes mostly move imports into functions so they are lazy-loaded, or move imports under a TYPE_CHECKING guard so they only run during type checking. As I commented earlier, not all of the changes are pretty. My rationale is the following: nearly all of the functions I moved imports into will probably run only occasionally, or even just once at object init (e.g. many of the functions in hf_transformers_utils.py). The lazy loading looks a bit uglier, but otherwise we get good benefits for effectively no downside. I left some more comments with other thoughts on how best to do this.

Also, this work is largely specific to the import path and object in the issue (Scheduler). I think these changes should help other paths as well (e.g. the changes to hf_transformers_utils.py should be useful, among others). If there are other paths that should be targeted, let me know and I will work on them. Furthermore, I think there is a lot of free lunch in two types of changes:

    1. For imports that are only used as types, moving them under an if TYPE_CHECKING: block seems to have no downside. A lot of the code already does this, but some parts don't, since I was able to find these changes along this path.
    2. Using the LazyImport utility where possible. This issue has been described before (Slow import #606), and LazyImport is currently only used in sglang/__init__.py. There seems to be no downside to using it (but I could be misunderstanding, so please let me know), so we could use it more broadly.

So these two changes could be used more broadly than just this path.
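A minimal sketch of the first change, the TYPE_CHECKING pattern. The function describe is hypothetical, and decimal.Decimal stands in for a heavy class that is only used in annotations:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; never imported at runtime.
    from decimal import Decimal  # stand-in for a heavy class used only in annotations


def describe(value: Decimal) -> str:
    # With postponed evaluation of annotations, `Decimal` is stored as a string,
    # so the missing runtime import does not raise NameError.
    return f"value={value}"


print(describe(3))  # value=3
print(describe.__annotations__["value"])  # Decimal
```

One caveat worth checking before applying this broadly: code that resolves annotations at runtime, such as typing.get_type_hints, cannot see names imported only under TYPE_CHECKING and will raise NameError for them.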

At this point I'm going to open this for review. I'm not sure it exactly tackles what the original issue was getting at, so I'd appreciate any clarification on what direction this PR should take. Thanks to the reviewers for taking the time to look at this PR!

@hnyls2002 (Collaborator)

@zhyncs @merrymercy Do you think this is needed?

@zhyncs (Member) commented Nov 10, 2025

> @zhyncs @merrymercy Do you think this is needed?

This change does not seem to break existing functionality, so I think it is acceptable.

@raayandhar (Author)

If it's acceptable, I'm happy to continue applying the "free lunch" changes I mentioned in the bigger comment above. But I defer to the experts.

@zhyncs (Member) commented Nov 10, 2025

The drawback is that it introduces some extra cognitive load. Could you fix the conflicts first? Thanks!

@raayandhar (Author)

> The drawback is that it introduces some extra cognitive load. Could you fix the conflicts first? Thanks!

Sure, will do.

Signed-off-by: Raayan Dhar
@hnyls2002 (Collaborator)

Please fix the conflicts. @raayandhar

Also, I think this PR should be reviewed by @merrymercy.

@raayandhar (Author) commented Nov 16, 2025

I investigated the previous CI errors and, to the best of my understanding, they are unrelated, but I will monitor carefully again.
The changes are fairly extensive across many different files. One strategy to reduce cognitive load: when importing heavy classes, use LazyImport and keep it at the top level (perhaps with a note that the import is expensive, to explain why). When importing a utility function with a heavy import path that is used only once in the whole file, keep the import at function level. And when an import is only used for type checking, put it under a TYPE_CHECKING block. I already do the latter two, but can update my PR to do the first. I think this is close in spirit to the earlier review and might be the cleanest way to handle this. Happy to discuss further.
